The Factorization Curse: Which Tokens You Predict Underlie the Reversal Curse and More

Neural Information Processing Systems

Today's best language models still struggle with hallucinations: factually incorrect generations, which impede their ability to reliably retrieve information seen during training. The reversal curse, where models cannot recall information when probed in a different order than was encountered during training, exemplifies this in information retrieval. We reframe the reversal curse as a factorization curse: a failure of models to learn the same joint distribution under different factorizations. Through a series of controlled experiments with increasing levels of realism, including WikiReversal, a setting we introduce to closely simulate a knowledge-intensive finetuning task, we find that the factorization curse is an inherent failure of the next-token prediction objective used in popular large language models. Moreover, we demonstrate that reliable information retrieval cannot be solved with scale, reversed tokens, or even naive bidirectional-attention training. Consequently, various approaches to finetuning on specialized data would necessarily provide mixed results on downstream tasks, unless the model has already seen the right sequence of tokens. Across five tasks of varying levels of complexity, our results uncover a promising path forward: factorization-agnostic objectives can significantly mitigate the reversal curse and hint at improved knowledge storage and planning capabilities.


Show and Tell: Prompt Strategies for Style Control in Multi-Turn LLM Code Generation

Bohr, Jeremiah

arXiv.org Artificial Intelligence

Language models generate functionally correct code that tends toward excessive verbosity, with elaborate documentation and defensive patterns that diverge from human baselines. Two prompting mechanisms have emerged for stylistic control: instruction-based prompts that articulate abstract directives, and example-based prompts that provide concrete code demonstrations. The core problem is whether stylistic constraints persist when models enhance initial implementations with additional features while maintaining high functional accuracy. Here we show that instruction-based, example-based, and combined prompts produce distinct patterns of initial control and expansion discipline over one enhancement turn. We manipulated system prompts across four conditions in a paired two-turn protocol where models first generated solutions to an intermediate Python task, then revised their code under general improvement directives, holding the user task fixed (N = 160 paired programs). Combined prompts produced the strongest initial compression and greatest expansion discipline. Instructions showed large initial effects and moderate expansion discipline. Examples showed modest initial effects with no expansion discipline. These results show that initial prompt effectiveness and expansion discipline are separate aspects of prompt design, and that combined approaches provide the most stable stylistic control in this two-turn workflow.
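The paired two-turn measurement described above can be sketched in a few lines. This is an illustrative toy, not the study's actual pipeline: the `loc` and `expansion_ratio` helpers and the sample programs are hypothetical, chosen only to show how "expansion discipline" could be quantified as growth between an initial solution and its enhanced revision.

```python
# Toy sketch of a two-turn verbosity measurement (names and programs are
# illustrative, not from the study): compare how much a program grows
# between an initial solution (turn 1) and an enhanced revision (turn 2).

def loc(program: str) -> int:
    """Count non-blank lines of code."""
    return sum(1 for line in program.splitlines() if line.strip())

def expansion_ratio(turn1: str, turn2: str) -> float:
    """Growth factor from the initial solution to the revision."""
    return loc(turn2) / loc(turn1)

# Hypothetical paired programs for one task under one prompt condition.
turn1 = """def add(a, b):
    return a + b"""
turn2 = """def add(a, b):
    \"\"\"Add two numbers.\"\"\"
    if not isinstance(a, (int, float)):
        raise TypeError("a must be numeric")
    return a + b"""

print(expansion_ratio(turn1, turn2))  # prints 2.5: the revision expanded
```

Under this framing, a prompt condition with strong expansion discipline would keep the ratio close to 1.0 across the N paired programs, while an undisciplined condition would let it drift upward.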


AVOIDDS: Aircraft Vision-based Intruder Detection Dataset and Simulator

Neural Information Processing Systems

Finally, we implement a fully-integrated, closed-loop simulator of the vision-based detect-and-avoid problem to evaluate trained models with respect to the downstream collision avoidance task. This benchmark will enable further research in the design of robust machine learning systems for use in safety-critical applications.


The Factorization Curse: Which Tokens You Predict Underlie the Reversal Curse and More

Kitouni, Ouail, Nolte, Niklas, Bouchacourt, Diane, Williams, Adina, Rabbat, Mike, Ibrahim, Mark

arXiv.org Artificial Intelligence

Today's best language models still struggle with hallucinations: factually incorrect generations, which impede their ability to reliably retrieve information seen during training. The reversal curse, where models cannot recall information when probed in a different order than was encountered during training, exemplifies this in information retrieval. We reframe the reversal curse as a factorization curse: a failure of models to learn the same joint distribution under different factorizations. Through a series of controlled experiments with increasing levels of realism, including WikiReversal, a setting we introduce to closely simulate a knowledge-intensive finetuning task, we find that the factorization curse is an inherent failure of the next-token prediction objective used in popular large language models. Moreover, we demonstrate that reliable information retrieval cannot be solved with scale, reversed tokens, or even naive bidirectional-attention training. Consequently, various approaches to finetuning on specialized data would necessarily provide mixed results on downstream tasks, unless the model has already seen the right sequence of tokens. Across five tasks of varying levels of complexity, our results uncover a promising path forward: factorization-agnostic objectives can significantly mitigate the reversal curse and hint at improved knowledge storage and planning capabilities.
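The factorization point in this abstract admits a tiny worked example. The sketch below uses toy counts (not data from the paper) to show that the same joint P(a, b) factorizes as P(a)P(b|a) or P(b)P(a|b), while left-to-right next-token training on one order only ever estimates the forward conditional:

```python
# Toy illustration of the factorization curse (numbers are invented, not
# from the paper): a corpus that always states facts in one direction
# gives a next-token model training signal for P(b | a) but not P(a | b),
# even though both conditionals define the same joint distribution.

from collections import Counter

# Toy training corpus: (city, country) facts, always in this order.
corpus = [("Paris", "France"), ("Rome", "Italy"), ("Paris", "France")]

pair_counts = Counter(corpus)

# Forward conditionals P(country | city): what left-to-right
# next-token prediction on this corpus estimates.
city_counts = Counter(a for a, _ in corpus)
p_forward = {(a, b): n / city_counts[a] for (a, b), n in pair_counts.items()}
print(p_forward[("Paris", "France")])  # 1.0: the trained direction

# Reverse conditionals P(city | country) exist mathematically and define
# the same joint, but a model fit only on the forward order receives no
# training signal for them.
country_counts = Counter(b for _, b in corpus)
p_reverse = {(b, a): n / country_counts[b] for (a, b), n in pair_counts.items()}
print(p_reverse[("France", "Paris")])  # 1.0, yet never trained
```

A factorization-agnostic objective, in these terms, is one whose training signal covers conditionals from multiple orderings rather than a single fixed one.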


AVOIDDS: Aircraft Vision-based Intruder Detection Dataset and Simulator

Smyers, Elysia Q., Katz, Sydney M., Corso, Anthony L., Kochenderfer, Mykel J.

arXiv.org Artificial Intelligence

Designing robust machine learning systems remains an open problem, and there is a need for benchmark problems that cover both environmental changes and evaluation on a downstream task. In this work, we introduce AVOIDDS, a realistic object detection benchmark for the vision-based aircraft detect-and-avoid problem. We provide a labeled dataset consisting of 72,000 photorealistic images of intruder aircraft with various lighting conditions, weather conditions, relative geometries, and geographic locations. We also provide an interface that evaluates trained models on slices of this dataset to identify changes in performance with respect to changing environmental conditions. Finally, we implement a fully-integrated, closed-loop simulator of the vision-based detect-and-avoid problem to evaluate trained models with respect to the downstream collision avoidance task. This benchmark will enable further research in the design of robust machine learning systems for use in safety-critical applications. The AVOIDDS dataset and code are publicly available at https://purl.stanford.edu/hj293cv5980 and https://github.com/sisl/VisionBasedAircraftDAA respectively.
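The slice-based evaluation idea in this abstract can be sketched without the benchmark itself. The field names and numbers below are hypothetical (this is not the AVOIDDS API): the sketch only shows how grouping detection results by an environmental condition exposes per-slice performance changes.

```python
# Minimal sketch of slice-based evaluation (hypothetical fields, not the
# AVOIDDS interface): group per-image detection outcomes by an
# environmental condition and compare accuracy across slices.

from collections import defaultdict

# Hypothetical per-image results: (lighting condition, detected correctly?)
results = [
    ("day", True), ("day", True), ("day", False),
    ("night", True), ("night", False), ("night", False),
]

def accuracy_by_slice(results):
    """Per-condition detection accuracy."""
    hits, totals = defaultdict(int), defaultdict(int)
    for condition, correct in results:
        totals[condition] += 1
        hits[condition] += int(correct)
    return {c: hits[c] / totals[c] for c in totals}

print(accuracy_by_slice(results))
# e.g. day ~0.67 vs night ~0.33: a drop on the night slice flags a
# condition worth targeting when hardening the detector.
```

The same grouping generalizes to any of the dataset's labeled conditions (weather, relative geometry, geographic location) by changing the slicing key.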


Inferring Inference

Raju, Rajkumar Vasudeva, Li, Zhe, Linderman, Scott, Pitkow, Xaq

arXiv.org Artificial Intelligence

Patterns of microcircuitry suggest that the brain has an array of repeated canonical computational units. Yet neural representations are distributed, so the relevant computations may only be related indirectly to single-neuron transformations. It thus remains an open challenge how to define canonical distributed computations. We integrate normative and algorithmic theories of neural computation into a mathematical framework for inferring canonical distributed computations from large-scale neural activity patterns. At the normative level, we hypothesize that the brain creates a structured internal model of its environment, positing latent causes that explain its sensory inputs, and uses those sensory inputs to infer the latent causes. At the algorithmic level, we propose that this inference process is a nonlinear message-passing algorithm on a graph-structured model of the world. Given a time series of neural activity during a perceptual inference task, our framework finds (i) the neural representation of relevant latent variables, (ii) interactions between these variables that define the brain's internal model of the world, and (iii) message-functions specifying the inference algorithm. These targeted computational properties are then statistically distinguishable due to the symmetries inherent in any canonical computation, up to a global transformation. As a demonstration, we simulate recordings for a model brain that implicitly implements an approximate inference algorithm on a probabilistic graphical model. Given its external inputs and noisy neural activity, we recover the latent variables, their neural representation and dynamics, and canonical message-functions. We highlight features of experimental design needed to successfully extract canonical computations from neural data. Overall, this framework provides a new tool for discovering interpretable structure in neural recordings.
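The graph-structured, message-passing inference the abstract describes has a standard minimal instance: sum-product belief propagation. The sketch below is a toy two-variable chain with invented potentials, not the paper's framework; it shows the basic shape of a message function and how a belief is formed from local evidence plus an incoming message.

```python
# Toy sum-product message passing on a two-variable chain x1 -- x2 with
# binary states (potentials are invented, not from the paper). The
# message from x2 to x1 sums the pairwise potential over x2's states;
# multiplying it into x1's local evidence and normalizing gives the
# exact marginal p(x1) on this small graph.

phi1 = [0.9, 0.1]           # local evidence on x1 (unnormalized)
phi2 = [0.2, 0.8]           # local evidence on x2 (unnormalized)
psi = [[0.8, 0.2],          # psi[a][b]: compatibility of x1=a with x2=b,
       [0.2, 0.8]]          # favoring agreement between the variables

# Message from x2 to x1: m(a) = sum_b psi[a][b] * phi2[b]
m21 = [sum(psi[a][b] * phi2[b] for b in range(2)) for a in range(2)]

# Belief at x1: local evidence times incoming message, normalized.
belief = [phi1[a] * m21[a] for a in range(2)]
z = sum(belief)
p_x1 = [v / z for v in belief]
print(p_x1)  # a proper distribution over x1's two states
```

In the paper's setting the interesting objects are exactly these pieces, recovered from neural activity rather than written down: the latent variables, the pairwise structure (here `psi`), and the message functions (here the sum-product rule).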


Doroni Aerospace Announces New Crowdfunding Campaign on StartEngine

#artificialintelligence

Doroni Aerospace, Inc. ("Doroni") announces the launch of their new crowdfunding campaign on the equity crowdfunding platform. The company previously closed their first Reg CF raise on StartEngine on April 29, 2022, having officially raised $1,069,850 from 916 investors. Now, the company has its sights set on a $2M maximum offering and is offering investors 50% Bonus Shares of Preferred Stock for the first 3 days the campaign is live as part of a limited-time, welcome-back promotion. Doroni CEO/Founder Doron Merdinger is also inviting long-time supporters as well as new investors to join the team for an exclusive welcome-back webinar on Wednesday, July 20th at 3 PM EST. Doron will provide an overview of the company and the current development progress of the H1 eVTOL, answer questions, and offer a glimpse at what's next for the company.


People

#artificialintelligence

Problem decomposition and theory reformulation; integrated cognitive architectures for autonomous robots; distributed constraint satisfaction problems; semigroup theory and dynamical systems; category theory in software design.
Interests include machine learning, approximation algorithms, online algorithms, and planning systems.
Calvin, William H. – Theoretical neurophysiologist and author of "The Cerebral Code" and "How Brains Think".
Gesture and narrative language, animated agents, intonation, facial expression, computer vision.
Intersection of computer science and game theory, computer science and economics, multiagent systems, automated negotiation and contracting.